
    Low Latency Geo-distributed Data Analytics

    Low latency analytics on geographically distributed datasets (across datacenters, edge clusters) is an upcoming and increasingly important challenge. The dominant approach of aggregating all the data to a single datacenter significantly inflates the latency of analytics. At the same time, running queries over geo-distributed inputs using the current intra-DC analytics frameworks also leads to high query response times because these frameworks cannot cope with the relatively low and variable capacity of WAN links. We present Iridium, a system for low latency geo-distributed analytics. Iridium achieves low query response times by optimizing placement of both data and tasks of the queries. The joint data and task placement optimization, however, is intractable. Therefore, Iridium uses an online heuristic to redistribute datasets among the sites prior to queries' arrivals, and places the tasks to reduce network bottlenecks during the query's execution. Finally, it also contains a knob to budget WAN usage. Evaluation across eight worldwide EC2 regions using production queries shows that Iridium speeds up queries by 3x-19x and lowers WAN usage by 15%-64% compared to existing baselines.
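
    The placement idea can be made concrete with a small sketch. The following hypothetical Python fragment (names and the single-site simplification are ours, not the paper's) picks the one site at which to run a query's reduce tasks so that the slowest WAN transfer, whether a remote site's upload or the chosen site's download, is minimized; Iridium itself optimizes fractional task placements and additionally moves data before queries arrive.

        def best_site(data, up_bw, down_bw):
            # data[s]: bytes of intermediate data at site s;
            # up_bw/down_bw: per-site WAN bandwidths in bytes/sec.
            # Placing all tasks at site s means every other site uploads
            # its data and s downloads the rest.
            total = sum(data.values())
            def bottleneck(s):
                upload = max((data[r] / up_bw[r] for r in data if r != s),
                             default=0.0)
                download = (total - data[s]) / down_bw[s]
                return max(upload, download)
            return min(data, key=bottleneck)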

    Learning-aided Stochastic Network Optimization with Imperfect State Prediction

    We investigate the problem of stochastic network optimization in the presence of imperfect state prediction and non-stationarity. Based on a novel distribution-accuracy curve prediction model, we develop the predictive learning-aided control (PLC) algorithm, which jointly utilizes historic and predicted network state information for decision making. PLC is an online algorithm that requires zero a priori system statistical information, and consists of three key components, namely sequential distribution estimation and change detection, dual learning, and online queue-based control. Specifically, we show that PLC simultaneously achieves good long-term performance, short-term queue size reduction, accurate change detection, and fast algorithm convergence. In particular, for stationary networks, PLC achieves a near-optimal $[O(\epsilon), O(\log^2(1/\epsilon))]$ utility-delay tradeoff. For non-stationary networks, PLC obtains an $[O(\epsilon), O(\log^2(1/\epsilon) + \min(\epsilon^{c/2-1}, e_w/\epsilon))]$ utility-backlog tradeoff for distributions that last $\Theta(\frac{\max(\epsilon^{-c}, e_w^{-2})}{\epsilon^{1+a}})$ time, where $e_w$ is the prediction accuracy and $a = \Theta(1) > 0$ is a constant (the Backpressure algorithm \cite{neelynowbook} requires an $O(\epsilon^{-2})$ length for the same utility performance with a larger backlog). Moreover, PLC detects distribution change $O(w)$ slots faster with high probability ($w$ is the prediction size) and achieves an $O(\min(\epsilon^{-1+c/2}, e_w/\epsilon) + \log^2(1/\epsilon))$ convergence time. Our results demonstrate that state prediction (even imperfect) can help (i) achieve faster detection and convergence, and (ii) obtain better utility-delay tradeoffs.
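
    As a rough illustration of the change-detection component only, the sketch below flags a distribution change when the empirical distribution over a recent window drifts, in total variation, away from the long-run estimate. The paper's detector, statistics, and threshold differ; every name here and the 0.1 threshold are illustrative assumptions.

        from collections import Counter

        def empirical(samples):
            # empirical distribution of a list of (discretized) states
            n = len(samples)
            return {k: v / n for k, v in Counter(samples).items()}

        def tv_distance(p, q):
            # total-variation distance between two finite distributions
            return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0))
                             for k in set(p) | set(q))

        def change_detected(history, window, threshold=0.1):
            # declare a change when the recent window drifts far from
            # the long-run estimate (threshold is hypothetical)
            return tv_distance(empirical(history), empirical(window)) > threshold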

    Low latency via redundancy

    Low latency is critical for interactive networked applications. But while we know how to scale systems to increase capacity, reducing latency, especially the tail of the latency distribution, can be much more difficult. In this paper, we argue that the use of redundancy is an effective way to convert extra capacity into reduced latency. By initiating redundant operations across diverse resources and using the first result that completes, redundancy improves a system's latency even under exceptional conditions. We study the tradeoff with added system utilization, characterizing the situations in which replicating all tasks reduces mean latency. We then demonstrate empirically that replicating all operations can result in significant mean and tail latency reduction in real-world systems including DNS queries, database servers, and packet forwarding within networks.
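
    The core mechanism is easy to state in code. A minimal asyncio sketch with hypothetical names: issue the same operation against several resources, keep the first answer, and cancel the stragglers to reclaim capacity.

        import asyncio

        async def first_of(coros):
            # launch redundant copies of an operation and return the
            # first result to complete; cancel the rest
            tasks = [asyncio.ensure_future(c) for c in coros]
            done, pending = await asyncio.wait(
                tasks, return_when=asyncio.FIRST_COMPLETED)
            for t in pending:
                t.cancel()
            return next(iter(done)).result()

        async def lookup(server, delay):
            await asyncio.sleep(delay)   # stand-in for a real DNS query
            return server

        # asyncio.run(first_of([lookup("8.8.8.8", 0.08),
        #                       lookup("1.1.1.1", 0.03)]))  -> "1.1.1.1"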

    Tachyon: Reliable, Memory Speed Storage for Cluster Computing Frameworks

    Tachyon is a distributed file system enabling reliable data sharing at memory speed across cluster computing frameworks. While caching today improves read workloads, writes are either network or disk bound, as replication is used for fault-tolerance. Tachyon eliminates this bottleneck by pushing lineage, a well-known technique, into the storage layer. The key challenge in making a long-running lineage-based storage system is timely data recovery in case of failures. Tachyon addresses this issue by introducing a checkpointing algorithm that guarantees bounded recovery cost, and resource allocation strategies for recomputation under commonly used resource schedulers. Our evaluation shows that Tachyon outperforms in-memory HDFS by 110x for writes. It also improves the end-to-end latency of a realistic workflow by 4x. Tachyon is open source and is deployed at multiple companies.
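
    Lineage-based recovery itself is compact. A minimal sketch with hypothetical names (not Tachyon's API): a lost block is rebuilt by recursively recomputing its missing ancestors and replaying the jobs that produced them; checkpointed blocks terminate the recursion, which is what makes bounded recovery cost possible.

        def recover(block, available, lineage, recompute):
            # lineage[b] = (parent_blocks, job) that produced block b.
            # Rebuild missing ancestors first, then replay the job;
            # checkpointed blocks already sit in `available`, bounding
            # how deep the recursion can go.
            if block in available:
                return
            parents, job = lineage[block]
            for p in parents:
                recover(p, available, lineage, recompute)
            recompute(job, parents, block)
            available.add(block)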

    Online Convex Optimization Using Predictions

    Making use of predictions is a crucial, but under-explored, area of online algorithms. This paper studies a class of online optimization problems where we have external noisy predictions available. We propose a stochastic prediction error model that generalizes prior models in the learning and stochastic control communities, incorporates correlation among prediction errors, and captures the fact that predictions improve as time passes. We prove that achieving sublinear regret and constant competitive ratio for online algorithms requires the use of an unbounded prediction window in adversarial settings, but that under more realistic stochastic prediction error models it is possible to use Averaging Fixed Horizon Control (AFHC) to simultaneously achieve sublinear regret and constant competitive ratio in expectation using only a constant-sized prediction window. Furthermore, we show that the performance of AFHC is tightly concentrated around its mean.
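
    The averaging step of AFHC can be sketched directly. In the hypothetical fragment below, fhc_plan(t0) is a caller-supplied solver (an assumption of ours) that returns the length-w action plan computed at time t0 from the predictions available then; w staggered fixed-horizon controllers replan every w steps, and the executed action at each step is the average of their plans.

        import numpy as np

        def afhc(T, w, fhc_plan):
            # Average, at each step, the actions of w fixed-horizon
            # controllers whose length-w planning windows start at
            # offsets 0..w-1.
            actions, counts = np.zeros(T), np.zeros(T)
            for offset in range(w):
                for t0 in range(offset, T, w):
                    for i, u in enumerate(fhc_plan(t0)[:T - t0]):
                        actions[t0 + i] += u
                        counts[t0 + i] += 1
            return actions / np.maximum(counts, 1)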

    Small nucleolar RNAs determine resistance to doxorubicin in human osteosarcoma

    Doxorubicin (Dox) is one of the most important first-line drugs used in osteosarcoma therapy. Multiple and not fully clarified mechanisms, however, determine resistance to Dox. With the aim of identifying new markers associated with Dox-resistance, we found a global up-regulation of small nucleolar RNAs (snoRNAs) in human Dox-resistant osteosarcoma cells. We investigated if and how snoRNAs are linked to resistance. After RT-PCR validation of snoRNAs up-regulated in osteosarcoma cells with different degrees of resistance to Dox, we overexpressed them in Dox-sensitive cells. We then evaluated Dox cytotoxicity and changes in genes relevant for osteosarcoma pathogenesis by PCR arrays. SNORD3A, SNORA13 and SNORA28 reduced Dox cytotoxicity when overexpressed in Dox-sensitive cells. In these cells, GADD45A and MYC were up-regulated, and TOP2A was down-regulated. The same profile was detected in cells with acquired resistance to Dox. GADD45A/MYC-silencing and TOP2A-overexpression counteracted the resistance to Dox induced by snoRNAs. We report for the first time that snoRNAs induce resistance to Dox in human osteosarcoma by modulating the expression of genes involved in DNA damage sensing, DNA repair, ribosome biogenesis, and proliferation. Targeting snoRNAs or downstream genes may open new treatment perspectives in chemoresistant osteosarcomas.

    Network algorithmics and the emergence of the cortical synaptic-weight distribution

    When a neuron fires and the resulting action potential travels down its axon toward other neurons' dendrites, the effect on each of those neurons is mediated by the weight of the synapse that separates it from the firing neuron. This weight, in turn, is affected by the postsynaptic neuron's response through a mechanism that is thought to underlie important processes such as learning and memory. Although difficult to quantify, cortical synaptic weights have been found to obey a long-tailed unimodal distribution peaking near the lowest values, thus confirming some of the predictive models built previously. These models are all causally local, in the sense that they refer to the situation in which a number of neurons all fire directly at the same postsynaptic neuron. Consequently, they necessarily embody assumptions regarding the generation of action potentials by the presynaptic neurons that have little biological interpretability. In this letter we introduce a network model of large groups of interconnected neurons and demonstrate, making none of the assumptions that characterize the causally local models, that its long-term behavior gives rise to a distribution of synaptic weights with the same properties that were experimentally observed. In our model the action potentials that create a neuron's input are, ultimately, the product of network-wide causal chains relating what happens at a neuron to the firings of others. Our model is thus of a causally global nature and predicates the emergence of the synaptic-weight distribution on network structure and function. As such, it also has the potential to become instrumental in the study of other emergent cortical phenomena.

    Kairos: Preemptive Data Center Scheduling Without Runtime Estimates

    The vast majority of data center schedulers use task runtime estimates to improve the quality of their scheduling decisions. Knowledge about runtimes allows the schedulers, among other things, to achieve better load balance and to avoid head-of-line blocking. Obtaining accurate runtime estimates is, however, far from trivial, and erroneous estimates lead to sub-optimal scheduling decisions. Techniques to mitigate the effect of inaccurate estimates have shown some success, but the fundamental problem remains. This paper presents Kairos, a novel data center scheduler that assumes no prior information on task runtimes. Kairos introduces a distributed approximation of the Least Attained Service (LAS) scheduling policy. Kairos consists of a centralized scheduler and per-node schedulers. The per-node schedulers implement LAS for tasks on their node, using preemption as necessary to avoid head-of-line blocking. The centralized scheduler distributes tasks among nodes in a manner that balances the load and imposes on each node a workload in which LAS provides favorable performance. We have implemented Kairos in YARN. We compare its performance against the YARN FIFO scheduler and Big-C, an open-source state-of-the-art YARN-based scheduler that also uses preemption. Compared to YARN FIFO, Kairos reduces the median job completion time by 73% and the 99th percentile by 30%. Compared to Big-C, the improvements are 37% for the median and 57% for the 99th percentile. We evaluate Kairos at scale by implementing it in the Eagle simulator and comparing its performance against Eagle. Kairos improves the 99th percentile of short job completion times by up to 55% for the Google trace and 85% for the Yahoo trace.
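
    The per-node policy is simple to sketch. A minimal, hypothetical Python illustration of Least Attained Service (not Kairos's implementation): the task that has received the least service so far always runs next, so a fresh arrival with zero attained service preempts long-running tasks without any runtime estimate. The YARN container preemption and the centralized load balancer are omitted.

        import heapq

        class LASQueue:
            # per-node LAS: always run the least-served task; task ids
            # serve as tie-breakers, completion handling is omitted
            def __init__(self):
                self._heap = []            # (attained_service, task_id)

            def add(self, task_id):
                heapq.heappush(self._heap, (0.0, task_id))

            def next_quantum(self, quantum):
                # pick the least-served task, charge it one quantum,
                # then requeue it behind better-served peers
                attained, task = heapq.heappop(self._heap)
                heapq.heappush(self._heap, (attained + quantum, task))
                return task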